The increasing complexity of gameplay mechanisms in modern video games is leading to the emergence of a wider range of ways to play games. The variety of possible play-styles needs to be anticipated by designers, through automated tests. Reinforcement Learning is a promising answer to the need of automating video game testing. To that effect one needs to train an agent to play the game, while ensuring this agent will generate the same play-styles as the players in order to give meaningful feedback to the designers. We present CARMI: a Configurable Agent with Relative Metrics as Input. An agent able to emulate the players play-styles, even on previously unseen levels. Unlike current methods it does not rely on having full trajectories, but only summary data. Moreover it only requires little human data, thus compatible with the constraints of modern video game production. This novel agent could be used to investigate behaviors and balancing during the production of a video game with a realistic amount of training time.
translated by 谷歌翻译
Modern video games are becoming richer and more complex in terms of game mechanics. This complexity allows for the emergence of a wide variety of ways to play the game across the players. From the point of view of the game designer, this means that one needs to anticipate a lot of different ways the game could be played. Machine Learning (ML) could help address this issue. More precisely, Reinforcement Learning is a promising answer to the need of automating video game testing. In this paper we present a video game environment which lets us define multiple play-styles. We then introduce CARI: a Configurable Agent with Reward as Input. An agent able to simulate a wide continuum range of play-styles. It is not constrained to extreme archetypal behaviors like current methods using reward shaping. In addition it achieves this through a single training loop, instead of the usual one loop per play-style. We compare this novel training approach with the more classic reward shaping approach and conclude that CARI can also outperform the baseline on archetypes generation. This novel agent could be used to investigate behaviors and balancing during the production of a video game with a realistic amount of training time.
translated by 谷歌翻译
This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Universit\'e de Montr\'eal in August 2022. The team tackled a problem submitted by CBC/Radio-Canada on the theme of Automatic Text Simplification (ATS).
translated by 谷歌翻译
Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total number of actions for the two players. We also propose two Follow the Regularize leader (FTRL) algorithms for this setting: Balanced-FTRL which matches this lower bound, but requires the knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.
translated by 谷歌翻译
Line segments are ubiquitous in our human-made world and are increasingly used in vision tasks. They are complementary to feature points thanks to their spatial extent and the structural information they provide. Traditional line detectors based on the image gradient are extremely fast and accurate, but lack robustness in noisy images and challenging conditions. Their learned counterparts are more repeatable and can handle challenging images, but at the cost of a lower accuracy and a bias towards wireframe lines. We propose to combine traditional and learned approaches to get the best of both worlds: an accurate and robust line detector that can be trained in the wild without ground truth lines. Our new line segment detector, DeepLSD, processes images with a deep network to generate a line attraction field, before converting it to a surrogate image gradient magnitude and angle, which is then fed to any existing handcrafted line detector. Additionally, we propose a new optimization tool to refine line segments based on the attraction field and vanishing points. This refinement improves the accuracy of current deep detectors by a large margin. We demonstrate the performance of our method on low-level line detection metrics, as well as on several downstream tasks using multiple challenging datasets. The source code and models are available at https://github.com/cvg/DeepLSD.
translated by 谷歌翻译
We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirable to converge to such solutions. Our central insight is that careful designs of the optimization dynamics are critical to learning meaningful representations. We identify that a faster paced optimization of the predictor and semi-gradient updates on the representation, are crucial to preventing the representation collapse. Then in an idealized setup, we show self-predictive learning dynamics carries out spectral decomposition on the state transition matrix, effectively capturing information of the transition dynamics. Building on the theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.
translated by 谷歌翻译
Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and allows to learn flexible and scalable diffusion models for both conditional and unconditional text generation. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models - while being in theory more efficient on accelerator hardware at inference time. Our work paves the way for scaling up diffusion models for text, similarly to autoregressive models, and for improving performance with recent refinements to continuous diffusion.
translated by 谷歌翻译
盲源分离(BSS)算法是无监督的方法,通过允许物理有意义的数据分解,它们是高光谱数据分析的基石。 BSS问题不足,解决方案需要有效的正则化方案,以更好地区分来源并产生可解释的解决方案。为此,我们研究了一种半监督的源分离方法,在这种方法中,我们将预测的交替最小二乘算法与基于学习的正则化方案结合在一起。在本文中,我们专注于通过使用生成模型来限制混合矩阵属于学习的歧管。总而言之,我们表明,这允许具有创新的BSS算法,具有提高的精度,可提供物理上可解释的解决方案。在涉及强噪声,高度相关的光谱和不平衡来源的挑战性场景中,对现实的高光谱天体物理数据进行了测试。结果突出了在减少来源之间的泄漏之前,学到的重大好处,这可以使总体上更好的分解。
translated by 谷歌翻译